PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning

Authors

Boxing Chen

Roland Kuhn

Samuel Larkin

Abstract

Many machine translation (MT) evaluation metrics have been shown to correlate better with human judgment than BLEU. In principle, tuning on these metrics should yield better systems than tuning on BLEU. However, due to issues such as speed, requirements for linguistic resources, and optimization difficulty, they have not been widely adopted for tuning. This paper presents PORT, a new MT evaluation metric which combines precision, recall and an ordering metric and which is primarily designed for tuning MT systems. PORT does not require external resources and is quick to compute. It has a better correlation with human judgment than BLEU. We compare PORT-tuned MT systems to BLEU-tuned baselines in five experimental conditions involving four language pairs. PORT tuning achieves consistently better performance than BLEU tuning, according to four automated metrics (including BLEU) and to human evaluation: in comparisons of outputs from 300 source sentences, human judges preferred the PORT-tuned output 45.3% of the time (vs. 32.7% for the BLEU-tuned output, with 22.0% ties).

Similar resources

A Customizable MT Evaluation Metric for Assessing Adequacy Machine Translation Term Project

This project describes a customizable MT evaluation metric that provides system-dependent scores for the purposes of tuning an MT system. The features presented focus on assessing adequacy over fluency. Rather than simply examining features, this project frames the MT evaluation task as a classification question to determine whether a given sentence was produced by a human or a machine. Support Ve...


Removing Biases from Trainable MT Metrics by Using Self-Training

Most trainable machine translation (MT) metrics train their weights on human judgments of state-of-the-art MT systems' outputs. This makes trainable metrics biased in many ways. One of them is preferring longer translations. These biased metrics when used for tuning are evaluating different types of translations – n-best lists of translations with very diverse quality. Systems tuned with these m...


MAXSIM: A Maximum Similarity Metric for Machine Translation Evaluation

We propose an automatic machine translation (MT) evaluation metric that calculates a similarity score (based on precision and recall) of a pair of sentences. Unlike most metrics, we compute a similarity score between items across the two sentences. We then find a maximum weight matching between the items such that each item in one sentence is mapped to at most one item in the other sentence. Th...


NEW CRITERIA FOR RULE SELECTION IN FUZZY LEARNING CLASSIFIER SYSTEMS

Designing an effective criterion for selecting the best rule is a major problem in the process of implementing Fuzzy Learning Classifier (FLC) systems. Conventionally, confidence and support, or combined measures of these, are used as criteria for fuzzy rule evaluation. In this paper, new entities, namely precision and recall from the field of Information Retrieval (IR) systems, are adapted as alternative...


Evaluation Techniques Applied to Domain Tuning of MT Lexicons

We describe a set of evaluation techniques applied to domain tuning of bilingual lexicons for machine translation. Our overall objective is to translate a domain-specific document in a foreign language (in this case, Chinese) to English. First, we perform an intrinsic evaluation of the effectiveness of our domain-tuning techniques by comparing our domain-tuned lexicon to a manually constructed ...


Publication date: 2012
